Design of Toy Proteins Capable to Rearrange Conformations in a Mechanical Fashion 
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We design toy protein mimicking a machine-like function of an enzyme. Using an insight gained 
by the study of conformation space of compact lattice polymers, we demonstrate the possibility of a 
large scale conformational rearrangement which occurs (i) without opening a compact state, and (ii) 
along a linear (one-dimensional) path. We also demonstrate the possibility to extend sequence design 
method such that it yields a "collective funnel" landscape in which the toy protein (computationally) 
folds into the valley with rearrangement path at its bottom. Energies of the states along the path 
can be designed to be about equal, allowing for diffusion along the path. They can also be designed 
to provide for a significant bias in one certain direction. Together with a toy ligand molecule, 
our "enzimatic" machine can perform the entire cycle, including conformational relaxation in one 
direction upon ligand binding and conformational relaxation in the opposite direction upon ligand 
release. This model, however schematic, should be useful as a test ground for phenomenological 
theories of machine-like properties of enzymes. 
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I. INTRODUCTION 
A. The problem and the context 

Theoretical progress in understanding proteins in the 
recent years was concentrated on folding, along with con- 
nected questions of sequence design and evolution (see 
bookl!l and references therein for the recent overview). 
Folding attracts theorists not only because it is so im- 
portant for fundamental biology and for pharmaceutical 
industry, but also because it is a robust universal phe- 
nomenon. Vast number of proteins exhibit the ability to 
fold, and there is a widely recognized necessity to un- 
derstand the physical principle behind the selection of 
sequences capable of fast and reliable folding. 

In our opinion, there is one more aspect of proteins 
which is equally robust and appealing for theoretical 
analysis in terms of some minimal model. We mean here 
the ability of many proteins to function in a machine- 
like fashion through ordered conformational rearrange- 
ment. This is most obvious for motor proteins whose 
function is directly related to certain mechanical (con- 
formational) movementsB. This is also clear for ion chan- 
nels whose function is to mechanically move molecules 
(or ions) from one place to the other (e.g., across the 
membrane). Although less obvious, conformational mo- 
tions appear to be also very much at play in proteins 
whose function is purely electronic, such as, e.g., electron 
transfer in bioenergetics or catalysis of a chemical reac« 
tion. This latter point was first formulated by McClareEl 
and independently by BlumenfeldcL More recently, it was 
extensively discussed in the bookQ. Somewhat different 
viewpoint on this subject was also recently presented in 
the bookfl. n 

The new experimental data by H. Gruler et alEI support 
the idea of slow conformational relaxation being impor- 
tant for the operation of enzimatic molecular machines. 
More detailed phenomenological models of enzyme oper- 
ation based on the concept of conformational relaxation 



as a biased diffusion process have been successfully im- 
plemented to interpret the experimental results! There 
is now the Data Base of conformational movements in 
proteinsO. 

We emphasize two general properties of function- 
related conformational movements in proteins. First, 
they occur without significant opening of the dense glob- 
ular structure. Viewed in the context of contemporary 
folding theories, this property seems quite exciting. In- 
deed, as native globule is pretty dense, frequently-mod- 
eled as a maximally compact self-avoiding polymeii!3, the 
inside movements may be expected to be strongly sup- 
pressed. Counterargument to this suggests that in fact 
real protein globule does have certain voids and is not 
absolutely densetil. Nevertheless, the density of a typical 
protein globule is similar to that of a polymer melt, for 
which reason it may be expected to be extremely viscous 
if not altogether glassy. The observation of significant 
conformational movements inside such a dense polymeric 
conglomerate challenges theory to offer an explanation. 

Second general property we would like to mention is 
the presence of some preferred collective degree of free- 
dom - which is almost a synonym to a functioning de- 
vice. For instance, enzymes work in cycles, and each 
cycle means a turn around some loop in the conforma- 
tional space. For channel-forming protein, a part of this 
loop corresponds to a transported ion moving from one 
place to the other, the rest of the loop corresponds to 
the protein coming back. According tq-,the arguments 
developed a long time ago (see the bookfil and references 
therein), it is important that there is only one collective 
degree of freedom along the loop of function (which, of 
course, does not rule out "transverse" fluctuations^]) . 

Importantly, machine-like function is realized well 
away from equilibrium conditions. That means, there 
is no detailed balance, and the system moves along the 
loop in one direction and not in the other - there is no, 
and should not be, detailed balance. This, however, does 
not rule out the possibility that some parts of the cycle 
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may present themselves as being the motions along the 
same path in opposite directions, like, e.g., a piston mov- 
ing up and down in a steam engine. We shall return to 
this point later. 

The notion of preferred function-related degree of free- 
dom may be compared in some respects to the con- 
cept of reaction coordinate much discussed in folding 
studiesE3o. In both cases, the presence of transverse 
fluctuations is important. In case of folding, this gives 
rise to the understanding that, e.g., transition "state" 
is not a microstate, determined to aipmic details, but 
rather an ensemble of (micro)stateaij. The preferred 
functional degree of freedom must be considered in pretty 
much the same way. 

It is an exciting question whether these two collective 
degrees of freedom, relevant for folding and function, are 
connected to one another. One could even speculate that 
they may be the same, or similar to some approximation. 
As of today, this question remains open. However, we 
note in passing that turn-around times reported injaaod- 
ern single molecule experiments on enzymes such ascj, on 
the order of a fraction of a millisecond, are not drastically 
different from typical folding times. 

In the present paper, our goal is modest, but two-fold. 
First, we want to see, at least for the simplest model, 
how one can imagine a collective degree of freedom allow- 
ing orderly motion without opening the compact globule. 
Second, we want to design such a sequence that the glob- 
ule energy changes in some desirable fashion while mov- 
ing along the preferred coordinate. This way, we want to 
mimic a molecular spring. 

Thus, our work is organized as follows. We first de- 
scribe the model and formulate our problem in a more 
explicit way for that model. We then discuss the possi- 
bility to design the native state conformation, or, better 
to say, the ensemble of nearly-native state conformations, 
in such a way that they can realize the one-dimensional 
motion within an almost compact globule. After that, we 
design the sequence capable to fold into such a function- 
ing conformation. At the end we study some properties 
of thus designed toy protein. 



B. A digression: from spherical cow to lattice 
protein 

We shall work with the toy lattice model of protein. 
We understand that these words will run the emotions 
high and negative with many readers, and so we want 
to answer that from the very beginning. Everyone un- 
derstands that there are no lattices in biological world, 
but this argument itself, however obvious, is too cheap 
to prove the lattice models useless. Indeed, for example, 
everyone laughs at the famous anecdote about a spherical 
cow, but at the same time everyone tacitly agrees that the 
model of a spherical cow is useful, e.g., to understand the 
scaling laws relating-animal body mass and the rate of 
oxygen consumption^. Thus, the dispute about useful- 



ness, or the lack of one, for the lattice models in protein 
studies cannot be resolved on the level of philosophy, this 
is the question of specific purpose of certain studies. Not 
entering the details, there are questions for which lat- 
tice models are totally inappropriate (and may deserve 
laugh), there are some other questions for which using 
lattice models is legitimate. As we hope to prove by the 
results, our present paper belongs to the latter category. 

Thus, we use standard toy lattice model in which pro- 
tein is represented as a self-avoiding walk of the desir- 
able length. The protein changes its conformation by 
means of elementary moves, including corner flips, end 
flips, crankshafts, and null moveslia. The advantage of 
this move ..set is that the resulting system is known to 
be ergodidlj. For this model, polymer moves by making 
discrete succession of steps from one conformation to the 
next. Accordingly, the preferred degree of freedom must 
be associated with certain a linear (one-dimensional) suc- 
cession of conformations, in which every conformation 
may only move into either previous or next conformation 
of the same group. 



II. DESIGN OF CONFORMATION 



A. Space of conformations 



We stated above why the protein needs to have a se- 
lected degree of freedom for functional work. Here, we 
design lattice toy protein in such a way that its con- 
formations, while remain compact, can rearrange along 
one-dimensional linear path in conformational space. 

The concept of linear path can be easily explained if 
the conformational space of lattice protein is visualized 
as a graphcjOEj. In this representation, every confor- 
mation is denoted as a node of the graph, and two nodes 
are connected if and only if the transition between corre- 
sponding conformations is possible via single elementary 
move (see Figure [l]) . 

We are looking for linear paths in the conformational 
space graph (CSG). Obviously, linear path is the succes- 
sion of such nodes each of which is connected to exactly 
two other nodes. For swollen, non-compact polymer, a 
multitude of conformational motions is possible, the cor- 
responding nodes of the graph have very many connec- 
tions, and so none of the swollen conformations belong to 
the linear path. By constrast, in compact conformations 
conformational freedom of the polymer is very limited, 
and there is a hope to find linear paths. 

In order to find one-dimensionally-connected paths of 
compact conformations, we examined from this point of 
view the properties of CSG of the short lattice polymer. 
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FIG. 1: The conformational space of the lattice polymer can 
be visualized as a graph. The nodes represent conformations. 
Two nodes are connected if and only if the transition between 
them is possible via single elementary move. The linear rear- 
rangement pathway is shown in bold. 



B. Properties of the space of the compact 
conformations 

We build on the findings of the wor In that work, 
it was shown that placing the polymer inside a restricted 
size box makes the conformational space graph discon- 
nected, consisting of several disjoint pieces, or chambers. 
This finding was based on computer simulation of lat- 
tice polymers of various lengths N, each confined in the 
3x3x3 box on the lattice. For our present purposes, 
it is important to address another physical situation, in 
which N is fixed, while the degree of compactness of the 
chain may change. We achieve this by restricting the gy- 
ration radius of the chain R g and then looking at various 
specific values of R g . More specifically, this was done in 
the following manner. 

We consider lattice polymer of the length N = 18. We 
start from maximally compact conformation and allow it 
to make all elementary moves consistent with the chain 
self-exclusion, but possibly (and necessarily!) violating 
the compactness. We accept the conformation and place 
it as a node on the graph if and only if the chain gyra- 
tion radius in this conformation is less than the chosen 
threshold R g . All the accepted conformations are pic- 
tured on the graph as nodes and their connections with 
all other accepted conformations are established through 
exhaustive search. Then these new conformations again 
allowed to make elementary moves, new conformations 
are accepted if they do not exceed the same threshold R g , 
and the process repeats. As regards the limiting value of 
R g , we choose it experimentally, and it regulates both 
the compactness of the conformations and the number 
of conformations in the graph. The graph constructed 
in such a manner consists of several disjoint regionsEa, 
or chambers. That means, for two conformations, which 



belong to different chambers, there is no sequence of el- 
ementary moves, which transforms one of them into an- 
other without breaking the restriction on R g . Thus, the 
procedure must be repeated starting from different max- 
imally compact conformations to list all the chambers 
of the graph. The number of conformations in different 
chambers varies, reflecting the distribution of the clus- 
ters in the bond percolation problem (in this case, we 
deal with percolation in conformation spaceo) . Figure || 
shows the dependence of the number of the small cham- 
bers in the conformational space graph on the number 
of conformations locked in chamber. In terms of the un- 
derlying physical idea, this figure is similar to the result 
reported earlier in the workE3 for a different model. In 
the present model, we vara R g at the fixed number of 
monomers N. In the worked, the similar alleged percola- 
tion in the conformational space was controlled by chang- 
ing N while locking the polymer inside the 3x3x3 cube 
on the lattice, which of course implies fixed R g . 
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In (num. of conformations in chamber) 

FIG. 2: The dependence of the number of the small chambers 
in the CSG on the number of conformations locked in cham- 
ber. Two different CSGs were built limited by R g — 1.304 
(triangles) and R g = 1.305 (circles). For a comparison, max- 
imally compact conformations of lattice 18-mer have R g = 
1.2583. The conformational space of lattice 18-mer restricted 
by R g — 1.304 consist of 1094 small chambers. The "infinite" 
cluster incorporates 23536 compact conformations. 

Further, we established the connectivities of the nodes 
which belong to the largest chamber in the CSG. The dis- 
tribution of the connectivities of the nodes is shown in 
Figure [5]a, curve 1. It is compared with the distribution 
of the numbers of neighboring nodes for the same set of 
conformations, but not restricted with the i? g -condition 
(curve 2). Curve 2 is indistinguishable from the binomial 
distribution (which does not contradict th&idea that non- 
restricted CSG is a small- world networkEil) . The CSG 
built under the restriction on the values of R g of toy- 
protein conformations (curve 1) is significantly different, 
one can imagine this graph as a percolation cluster of the 
bond percolation problem on the, lattice with the topol- 
ogy of the small-world networked. For the small values 
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of R g (weakly connected cluster) the peak of the distri- 
bution corresponds to the graph nodes connected to only 
two neighbors. The sharp peak on the curve 1 corre- 
sponding to the poorly connected conformations can be 
easily explained. The change of the geometry of the voids 
in the bulk of the protein globule is possible only via small 
number of local moves, because the excluded volume ef- 
fect is very strong in compact conformations. Such subtle 
conformational moves do not affect significantly the value 
of the R g of the protein chain, whereas opening of some 
loop on the surface of the globule leads to the increase 
of the R g . On the other hand, the majority of the con- 
formational moves accessible to the unrestricted protein 
chain occurs on the surface. Accordingly, the limiting 
of R g from above forbids surface moves but does not re- 
strict the changes in the bulk of the globule. That is 
why for small limiting R g values the sharp peak of the 
distribution rises at the poorly connected conformations. 
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FIG. 3: (a) The distribution of the connectivities of the nodes 
within "infinite" cluster of the conformational space graph of 
the lattice 18-mer restricted by R g = 1.304 (triangles) com- 
pared with one for the non-restricted 18-mer (circles). The 
peak of the distribution corresponds to the nodes incorpo- 
rated into the linear paths, (b) The distribution of the linear 
paths in the "infinite" cluster by length. The longest path 
found consist of 9 elementary moves. 



Thus, the study of i? g -restricted conformations of the 
18-mer provides an example in which there is the peak of 
connectivity distribution which corresponds to the graph 
nodes with connectivity 2. These are desired conforma- 
tions forming linear paths. However, most of these linear 
paths are rather short and consist of only few elemen- 
tary moves. The distribution of the lengths of the paths 
is shown in Figure |^b, it is exponential distribution. 

Thus, our conclusion so far is this. Long linear paths 
exist, but they are not very common, they are rare. In 
this sense, the situation is similar to that of selection 
of sequences for toy proteins capable of foldingn. In se- 
quence selection case, the "good" sequences are exponen- 
tially rare among the random ones. The goal of sequence 
design, or selection, is to fish them out. Similarly, our 
goal now is to identify the rare conformations which are 
connected by long linear paths in the conformation space. 
In order to do that, we need to understand better the 
local geometry of conformations belonging to the linear 
paths. It cannot be done for the chain length of N = 18 
which is too small. It is also small compared with typical 
protein lengths, about one hundred or more in average. 
Thus, we repeated the same procedure of CSG mapping 
for the toy-protein of the length N — 125. Of course, in 
this case no exhaustive enumeration of conformations is 
possible, and so we performed random sampling instead. 



We started from the limiting value R g 



2.44949, 



which is equal to the gyration radius of the maximally 
compact conformation. In this case, no conformational 
movement is possible, and, therefore, CSG consists of as 
many disconnected nodes as the number of compact con- 
formations. We now increase R g by a very small amount. 
By several attempts, we choose the step of R g increment 
in such a way that after one step conformation space 
remains disintegrated, non-ergodic, consisting of discon- 
nected chamberstJ most of which contain only few con- 
formations. We then increase R g step by step, every time 
mapping the newly obtained parts of conformation space 
on the graph, as described before. When R g approaches 
2.46, the percolation transition occurs and there appears 
an "infinite" cluster. We mapped into the graph 1000000 
conformations from this cluster and searched for the long 
linear paths on the graph, as it was done in case of 18- 
mer. 

The longest paths we found for the 125-mer are or- 
ganized as follows. The conformations along the path 
transform into one another in a set of subsequent cor- 
ner flips in the bulk of compact conformation. When the 
first corner flip occurs, the vacant site opens producing 
the opportunity for another monomer to move. Then the 
similar process repeats many times, leading effectively to 
the vacancy traveling through the globule, finding the 
new corner flips on its way. These subsequent transfor- 
mations are demonstrated in Figure ^ which for the sim- 
plicity of presentation shows a smaller polymer, namely 
63-mer. 
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FIG. 4: A typical realization of the linear rearrangement 
path of the compact conformation of the lattice 63-mer. The 
vacancy travels through the bulk of the conformation in a 
set of subsequent corner flips. The bonds which participate 
in the conformation rearrangement are shown in bold color. 
The overlap between conformations on the opposite ends of 
the pathway comprises Q 1 ' 7 = 58, whereas maximal number 
of contacts Q max = 79. 



C. Design of the pathway in conformation space 

Now, when the mechanism of rearrangements within 
the compact conformation for the toy protein is clari- 
fied, we can propose the effective and straightforward 



approach to design the linear path of compact conforma- 
tions. 

The algorithm includes two main stages. First, we ar- 
range the switching elements along the path. Second, 
the rest of the polymer is computationally designed such 
that it is (almost) maximally compact and contains the 
necessary set of switching elements. As a result, we ob- 
tain one-dimensional path of compact conformations of 
the lattice toy protein. 

1. Building the switching elements. 

We build the linear path using flipping corners as 
switching elements. 

In the beginning of the procedure we have an empty 
cubic lattice. We start from choosing initial position of 
the vacancy (Figure ga, node (0,0,0)). The first flipping 
corner is drawn on the lattice in such a way that when 
it flips, the corner and the vacancy exchange their posi- 
tions. The edges forming next switching element should 
be drawn in such a way, that when it will flip, vacancy 
will hop to the former corner position opening the way 
for the next switching element. In the Figure ^|b two sub- 
sequent flipping corners are shown. After they both will 
move, the vacancy will take the position (1, 2, 1). Other 
flipping corners are drawn subsequently as many times 
as needed (Figure |]c) . 

At the same time as the switching elements arranged 
on the lattice we can control the linearity of the path. 
Every time when the new flipping corner is set on the 
lattice there are several ways to locate it relative to the 
vacancy position. We choose only one location, but the 
problem is that the other flipping corners may appear 
later, on the stage of the design of the whole conformation 
(see Figure ^d) . To prevent appearance of such parasitic 
switches we draw additional pieces of the conformation 
surrounding the pathway sites, as it is shown in Figure 
|e. 

2. Building the rest of conformation. 

We now have the set of switching elements, which are 
supposed to be just the disconnected pieces of the poly- 
mer. We have to find now the rest of the polymer, such 
that it fills (almost) completely the whole volume of the 
cube, and connects all switching elements into a linear 
chain. For this purpose, we developed computational 
method which is the modification of the approach pro- 
posed by Ramakrishnan et a£3 to generate maximally 
compact conformations. Here, we describe our modified 
algorithm. 

The conformation design starts with placing on the lat- 
tice the new edges connecting randomly chosen neighbor- 
ing vertices (Figure [6] a, b). This process soon brings us to 
the state, where some vertex cannot be connected to its 
randomly chosen neighbor, because the neighbor already 
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FIG. 5: The steps of the design of the linear rearrange- 
ment pathway, (a) The initial position of the vacancy and 
the first flipping corner are dawn on the lattice; (b,c) the 
other switching elements of the path are added; (d) possi- 
ble parasitic flips, which may later appear on the stage of 
the design of the whole conformation, are shown in grey; (e) 
to prevent parasitic flips additional elements of the conforma- 
tion surrounding the switching elements of the path are added 
(shown in yellow) ;(f) conformation design completed. 



FIG. 6: Tha-schematic representation of the application of 
the algorithms for the purposes of the design of the confor- 
mation incorporating the rearrangement pathway. For the 
simplicity the steps of the algorithm are shown in two di- 
mensions, (a) New edges randomly placed on the lattice; (b) 
Edges form subchains; (c) At some step edges can not be 
placed randomly; (d) Number of edges increases by one after 
applying two matching procedure; (e) Linear subchains an 
loops; (f) Subchains united. 



has two edges incident on it (Figure || c). In tiais case, 
special procedure called two-matching is appliednJ. Dur- 
ing this and the following steps of the algorithm, some 
randomly chosen edges can be removed from the lattice 
and changed by others. However, we impose the condi- 
tion that the edges forming switching elements cannot be 
removed at any step of conformation design. 

Two-matching starts from picking up the vertex P, 
which is either not connected, or has only one incident 
edge. Then its neighbor Q is chosen randomly as an 
opposite end of the new edge. If Q belongs to the linear 
subchain, then the ends of this subchain are checked for 
the possibility to be connected with their neighbors. If 
it can be done, a new edge (the edge ST at the figure 
^b) is added. One of the edges incident on Q is replaced 
by the edge PQ. Thus, in this procedure, the number 



of edges increases by one (Figure ||d). If, on the other 
hand, the vertex Q belongs to the looped subchain, one 
of its incident edges is removed and replaced by PQ. So, 
the loop is broken, and the total number of edges on the 
lattice remains unchanged. Typically, as a result of the 
work of this procedure, we obtain several looped and one 
linear subchain packed into the cubic lattice (Figure ^e). 

Now, the subchains should be merged into one chain. 
This is achieved in the following way. Suppose four neigh- 
boring vertices (K, L, M, N) form a square. The con- 
necting edges (K, L) and (M, N) belong to the different 
subchains. Excluding these edges and including instead 
(K, M) and (L, N), we would have merged subchains (see 
Figure ^|f). Such an operation is known as patching. 
Each patching operation transforms a pair of subchains 
into one subchain. The edges to be involved in patch- 
ing are chosen randomly. The process is stopped when 
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there is no more and no less than one linear chain on the 
lattice, which is the desired polymer conformation. 

In the original workEl, this method was applied to 
generate maximally compact lattice conformations. It is 
worth repeating that for our purposes we use this method 
starting from a complicated lattice which is the cube mi- 
nus elements chosen for switching elements. 

Generally, it is possible to use also crankshafts and 
flipping ends to design the switching elements. One just 
needs to forbid all the states of these elementary moves 
but two. This should be done by placing on the lattice 
additional edges (as it was done to prevent parasitic cor- 
ner flips), which restrict the extra states of these moves. 
However, the end flip can be used only twice as switch- 
ing element, because there are two ends of the chain. As 
regards the crankshaft move, it needs two vacant sites to 
make switching possible, which means the conformation 
in question should be slightly less compact. For these 
reasons, we use only corner flips as switching elements in 
this work. 



III. SEQUENCE DESIGN 

After we have chosen conformations, we should find 
the sequence which fits the target conformations with 
low energies. For this purpose the sequence of the model 
protein is annealed and Monte Carlo optimization in the 
sequence space is performed. The details of the algorithm 
are previously publishcdcH (see also review articlec3 and 
references therein for further details). It must be empha- 
sized that we plan to work with the sufficiently large set of 
monomer species; in fact, we shall even use the so-called 
Independent Interaction Model, in which the number of 
distinct species is as large as the number of monomers 
in the chain. This allows us to avoid difficulties well 
known in the case of sequence design .for the two-letter 
heteropolymers, such as the H P-modesEB. In the context 
of the present work, sequence design method had to be 
modified in two respects. First of all, we need not only 
one target conformation to have a low energy, as it is typ- 
ically assumed in protein folding simulations. We need 
the whole family of conformations - all conformations 
belonging to the rearrangement path - to have distinctly 
lower energies than all other states. Second, the more 
ambitious goal is to design the sequence in such a way 
that moving of the system along its pathway changes en- 
ergy in an orderly fashion. Since our rearrangement path 
is one-dimensional, we can pretend energy to increase 
monotonically along this path, in this sense making our 
toy protein a model of a molecular spring. 

Let us consider these two aspects of sequence design 
one by one. 



A. Sequence with multiple ground states 

The model protein is determined by the set of the co- 
ordinates of the monomers C = {ri} and the sequence of 
monomers seq = {sj} (the species sj £ {1, 2, ...q} denote 
the identity of each monomer, q is the total number of 
species), index / counts monomers along the chain. The 
Hamiltonian is written as follows 

N 

H(seq,C) = J2 B sis.A(ri - ?j) , (1) 

KJ 

where the energy of the conformation is determined by 
the matrix of species-species energies B^j for the con- 
tacting monomers and function A(r/ — fj) is defined such 
that it is equal to 1 if monomers /, J are lattice neigh- 
bors and otherwise. We use independent interactions 
model (IIM)El and Miyazawa-Jernigan (MJ)E3 matrices 
for Monte Carlo simulations in this article. 

In our approach the goal is to design sequences for 
which the whole set of conformations {Ck} have ener- 
gies sufficiently below that of the rest of conformation 
space. Of course, our candidates for the target states 
{Ck} are the conformations which belong to the previ- 
ously designed linear rearrangement path in conforma- 
tion space. These conformations are supposed to form a 
deep valley in the energy landscape. For the set of two 
target conformations, sequence design was performed in 
the paperEl In that work, the goal was to model pro- 
teins which can fold into two. (or more) distinct "native" 
conformations, like prionsEi Accordingly, two target 
conformations were chosen to be totally dissimilar (non- 
overlapping, or weakly overlapping) . In more details, 
such design was examined inEa. In our case, the prob- 
lem is almost the opposite. The conformations in ques- 
tion are very closely related, they can be mutually trans- 
formed into one another in just a few moves. Accordingly, 
the overlap between neighboring conformations along the 
path is very high. Of course, as the system walks along its 
pathway from one end to another, the overlap decreases, 
but still remains significant. For example, the overlap 
between the conformations at the opposite ends of the 
path shown in the Figure f| is as high as about 75%. 

The sequence optimization is governed by the following 
Hamiltonian: 

N c 

H dos (seq, {C k }) = 2 ^(seq, C k ) , (2) 

fe=i 

where Nc is the total number of conformations along the 
rearrangement pathway. 

The question which arises now is this. How efficient is 
the sequence optimization in the case of multiple closely 
related target states? Let us consider the simple sit- 
uation when only energies of two target conformations 
are optimized. These conformations, namely C\ and C2, 
could be, for example, the ends of the rearrangement 
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pathway. How deeply can their energies be lowered dur- 
ing the sequence design in comparison with an arbitrary 
other compact conformation C*? 

To make this estimate we can calculate the energy of 
the conformation C* averaged over the sequence space 



E se9 ^ e " (Wd05(s ° q,{Cfc}))/Td05 W(seq,C.) 



p£L-(H dcs ( S c q ,{C k }))/T dc 



(3) 

where Pseq — II/Li Psi ^ s the probability for the se- 
quences made randomly from independent monomer 
species with occurrence probabilities pi. The detailSj-of 
similar calculations are described in the review articled. 
The result for the present case of two target conforma- 
tions reads 

(E^seq)) = BQ* - °—[Q^ + Q 2 '*} , (4) 

J dcs 



where 



/ j 

ij 



SB 2 =J2Pi(Bij -B) 



Pi 



(5) 
(6) 



are the mean value and the variance of the interaction 
matrix. 



N 
KJ 



(7) 



is the total number of contacts in the conformation C* 
and its overlap with arbitrary target conformation Ck is 
denned as 

N 

= £ A(r1 - r*)A(r? - fj) . (8) 

KJ 

As we can see from the expression (Q), the energy of 
the designed sequence in the conformation C* depends on 
the similarity of this conformation to the target states. 
This similarity is measured by the overlap parameter 
+ Q 2 '*]. It takes the maximal value for the a given 
pair C\, C2, when C* coincides with either C± or Ci- Usu- 
ally, if C* lies on the rearrangement pathway, the overlap 
parameter [Q 1 * + Q 2 '*] takes values close to the maxi- 
mum. Therefore, not only energies of the target confor- 
mations are optimized, but conformations between them 
(e.g. along the designed linear path!) are optimized, too. 

In principle, sequence design method may also lower 
the energies of other states, which are related to the 
target states, but do not belong to the rearrangement 
pathway, and thus are non desirable for us here. This is 
a well known problem, generally addressed through the 
"negative design" (see, for instance,E3). Luckily, in our 
specific case, conformation design, as discussed above, 



helps to address this problem. Indeed, since there are 
no allowed conformations at all on the sides of the path- 
way, the only possible low energy decoys are structurally 
unrelated ones. 

Sequence design method employed hece may seem to 
contradict the results of the recent worked. In this work, 
authors estimated the maximal possible number of "na- 
tive" states which may be " memorized" by the sequence. 
They showed that this number is very limited, it is in- 
dependent on protein length, and is fully determined 
by the alphabet - the number, q, of distinct monomer 
species (for instance, for the system with q — 20 monomer 
species, there can be not more than four or five "native" 
states). lB,fact, there is no contradiction. The estimate of 
the workEj determines the maximal number of unrelated 
conformations which can be designed into the sequence. 
In our case, all target states are very closely related, and 
they in fact belong to the same potential well, or funnel, 
in the free energy landscape. They, of course, cannot be 
considered as independent. As we show below, our se- 
quence design process works successfully even when the 
number of states in the rearrangement path is as large as 
about ten. 



B. Sequence design with energy gradient along the 
rearrangement path 

So far, we have been discussing the sequence design 
procedure for which alLjthe target states Ck were equal. 
Now, following OrwelEa, we shall consider some of the 
conformations more equal than others. Specifically, we 
should remember that conformations Ck form a one- 
dimensional path. We did it on purpose, and we should 
use it now. To be specific, let us assume that conforma- 
tions Cfc are labeled with index A: in a natural way, such 
that k changes orderly from 1 at the one end of the rear- 
rangement pathway to the maximal value Nc at the other 
end. Then, we shall require that, say, C\ has the lowest 
energy, C2 has energy a little higher - preferably, by a cer- 
tain amount W higher; C3 we want to be about W higher 
in energy than C2, ... , and this continues all the way up 
to Cm c , which we want to be about (Nc — l)W above C\, 
but still much lower than all the other conformations. 

Why do we want such energy landscape? First of all, 
being all connected in one valley, the conformations Ck 
form together a basin of attraction for folding, or fold- 
ing funnel. Second, when correctly folded and at the 
bottom of the funnel, the system can still travel back 
and forth along the one-dimensional rearrangement path. 
This travel may be either due to fluctuations or due to 
some externally applied force. 

To achieve this aim we modified the design Hamilto- 
nian (^|) in the following way: 

N c 

H d os(seq, {Cfc}) = ^2 ^(seq, C k ) + 
fc=l 
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N c 

A^[H(seq,C fc 



W(seq,Ci) - kW]' 



(9) 



k=2 



where A is the "experimentally" adjusted parameter, and 
W is the desired energy gap between neighboring con- 
formations. The Monte-Carlo optimization ruled by the 
Hamiltonian (^J) has a bias towards sequences with lower 
energy in conformation C\ and subsequently higher ener- 
gies in conformations C2, C3, .... 

Various regimes are possible here depending on the re- 
lation between the design temperature Td es , real temper- 
ature T, and the value of W. Not entering the discussion 
of all these regimes, we mention that in what follows we 
have chosen the sequence optimization temperature to be 
T dcs = O.ldB. 



IV. RESULTS 
A. Design of the conformations and the sequence 

To demonstrate the work of our design method, we 
generated two sets of confor matio ns of lattice 63-mer as 
described above in the section [[I Cj . The linear rearrange- 
ment path of the conformation shown in Figure ^ includes 
7 conformations. In another example, shown below in 
Figure [l(], there are 6 conformations in the path. The 
location of the switching elements in these two cases is 
chosen differently. Switching elements in the former case 
are located in the bulk of the globule, whereas in the lat- 
ter case the switching elements are all located near the 
surface. 

For these two sets of target conformations, the se- 
quence optimization procedure was applied. The values 
of parameters were as follows: W — 0.35B, Td cs = 0.15B, 
A = 15, and the interaction matrix was chosen to be 
correspond to the independent interactions model (IIM) . 



B. Folding 

First of all, we have to check that the chains can cor- 
rectly fold into the conformation as designed. We used 
the set of conformations shown in the Figure 4[| as tar- 
get states. We compared folding rates for the sequence 
designed as proposed in this article and for the con- 
trol sequence, designed in a more traditional way, with 
conformation C± as the only purported ground state. 
All the folding experiments were started from different 
random unfolded conformation. The Monte Carlo sim- 
ulations were performed at temperatures in the range 
T = 0.7 -7- 0.9T m , where T m « 1.3SB was the midpoint 
temperature. Mean first passage time (MFPT) at every 
temperature was calculated by averaging over 30 folding 
runs. The results are shown in the Figure @. The folding 
times for the sequence which has multiple ground states 
are approximately 3 times longer than for the sequence 



with the unique native conformation. Further inspec- 
tion suggests that this happens because the depth of the 
global minimum for the sequence with multiple ground 
states can not be as well optimized as that for the se- 
quence with unique ground state. Nevertheless, the em- 
phasis of our result here is on the good news, not the bad 
ones: it is not important that our sequences are slower, 
it is important that they are insignificantly slower, only 
by a factor of about 3 slower. That means, they do fold, 
and their folding time is of the same order of magnitude. 

It is worth to emphasize that during this folding ex- 
periments, the chain was not confined in any restricted 
volume. We used volume restriction in the preliminary 
stage of this work, to elucidate the method of conforma- 
tion design. Now, as we are done with the design, we let 
the polymer do whatever it wants, and the result is that 
it folds and spontaneously arrives into the valley where 
it has a linear chain of conformations at its disposal. 




0.9 0.95 1.0 1.05 1.1 

Temperature 

FIG. 7: The comparison of the folding rates of the two se- 
quences. First sequence (squares) is optimized to have unique 
ground state , which is the conformation 1 in the Figure 4. 
Second sequence (triangles) is designed according to the ap- 
proach described in this paper and has 7 low energy confor- 
mations (all are shown in the Figure 4. The same IIM ma- 
trix with (B) — —0.025, SB = 1.0 used for the design and 
folding. Both sequences were designed at the temperature 
T dcs = 0.1.SB. 



C. Diffusion along the designed path 

Now, since we have established that our model does 
fold, we have to check if it can move along the designed 
path. In reality, conformational relaxation of a function- 
ing protein machine is triggered by the attachment or 
detachment of a substrate or other ligand; we shall con- 
sider this in the next section. Here, we want to perform 
the simpler test to see what happens without any stimuli. 
In this situation, we expect our toy protein to move ran- 
domly back and forth along the designed path. It should 
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be mostly in the lowest energy state C\ , but since the en- 
ergy difference between states along the path W — 0.3SB 
is only a fraction of thermal energy, bias towards the C\ 
end should be relatively weak, there should be plenty of 
fluctuations along the path. The point to be checked 
is that the system, while performing this random walk 
along the path, should not open up too frequently, the 
globule should stay compact. 

To examine how this happens in the toy-protein shown 
in the Figure [|, we run a long Monte Carlo simulations at 
different temperatures starting from the conformation C\ . 
The events of passing the conformations {Ck} of the pre- 
designed path are recorded. Coordinate along the path- 
way takes the value x — k if the vacancy position along 
the path coincides with conformation Ck ■ A typical " tra- 
jectory" of conformational changes for the first designed 
toy protein is shown in Figure g[ The inset of Figure ^ 
displays the details of one particular passage along the 
path from conformation C\ to C7 and back. The events 
when conformation changes in a way other than walking 
along the path (say, some loop opens on the surface of the 
globule), are pictured as the change of the coordinate in 
the perpendicular plane. As one can see, below the mid- 
point temperature the toy-protein stays steadily on the 
rearrangement path and makes random walks back and 
forth along it, with very limited fluctuations in the mul- 
titude of transverse directions in the conformation space. 




MC steps 

FIG. 8: A typical MC trajectory of the toy-protein shown in 
the Figure W at the ground state. The dependence of energy 
on the number of MC steps; (Inset) The dependence of the 
conformational coordinates on the number of MC steps. Co- 
ordinate along the linear rearrangement path is shown in the 
vertical plane. The deviations from the linear path confor- 
mations a shown schematically as the non-zero values in the 
horizontal plane. These deviations are related to the events 
of unfolding of some loops on the surface of the molecule. 

Thus, what we observe confirms that our toy pro- 
tein performs thermal fluctuations in the form of one- 



dimensional movement along the designed path. This is 
important, because if fluctuations occur that way, that 
means, under some different circumstances, e.g., applied 
external force, the system can perform also a forced 
movement along that same pathway, as it is expected 
for the function. Thus, we need to test carefully that the 
observed fluctuations are indeed the random walks effec- 
tively in one dimension. To address this quantitatively 
we calculate the correlation function (x(t)x(t + r)) for 
the toy protein and compare it with that for the artificial 
auxiliary random walker which diffuses on the ID dis- 
crete set of states with the same energy profile as for the 
protein. The good agreement between the two correla- 
tion functions is evident in Figure ^[ Hence the designed 
toy protein does indeed move along the one-dimensional 
pathway and in this sense it remembers its "function." 



2,0- 



1,5- """"-„. 




T 



FIG. 9: The correlation function of the random walk along 
the linear path. The simulation results (T = 0.55B - triangles; 
T = 0.78B - squares) are compared with the curves calculated 
for the particle performing one-dimensional walk at the same 
potential and at the same temperatures (T = 0.55B - solid 
line; T = 0.78 B - dots). 



D. Simulation of the binding/dissociation with the 
small molecule 

In the previous section we demonstrated that toy- 
protein can use the designed rearrangement path, per- 
forming a biased, but random motion. This biased walk 
can be triggered if the protein molecule is externally stim- 
ulated. As in the real biochemical world, this can be most 
efficiently done by attaching the ligand molecule to the 
protein globule. In this sense, the typical cycle of a pro- 
tein work may be described as followsu. 

We start from the protein molecule in its ground state. 
At some moment, a ligand molecule gets attached to the 
protein. With the ligand attached, protein conformation 
is no longer the ground state, some other conformation 
is now lower in energy, and, therefore, attachment of lig- 
and initiates the relaxation process of the globule into 
the new conformational state, which is the ground state 
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for the protein-plus-ligand complex. When this relax- 
ation is completed (or nearly completed), the new con- 
formation turns out well suitable for certain chemical (or 
other small length scale) changes, the result of which for 
the protein is the desorbtion of a ligand initiating again 
the relaxation process, this time back into the original 
ground state. This type of conformational relaxation 
processes coupled with the ligand binding plays is well 
known for motor proteinsEj, heme-containing proteinsEj, 
etc. In this work, following our general strategy, we want 
to design a toy ligand with which our toy protein can 
perform the entire cycle of its "function." 

We designed a toy protein which has binding site and 
is able to change its shape. The initial protein confor- 
mation and conformational changes activated by the ad- 
sorption of the toy ligand are shown in Figure [h]. The 
toy protein was designed as explained above. The linear 
rearrangement path of the molecule includes 6 conforma- 
tions. The conformation C\ is the lowest energy state for 
the protein with no ligand. The energies of interaction 
of the ligand with the " active center" of the protein are 
chosen such that as soon as the complex forms, the ener- 
gies of linear path conformations change in comparison 
with those in isolated protein molecule. The profile of 
the energy changes along the path of conformational re- 
arrangements in the protein-ligand complex is shown in 
Figure [ll]. When the ligand is attached, the conformation 
Cq of the path has the lowest energy. Hence as the small 
molecule binds to the protein in the state C%, it induces 
the cascade of conformational changes which drives the 
protein to the conformation Cg. The energy profile along 
the path of the conformational changes corresponds to 
the case of the suicide inhibitor due to the high barrier 
for the dissociation of the ligand in the conformation C§. 




a) b) 

FIG. 10: Two conformations of the toy-protein represent- 
ing the ends of the linear rearrangement path and the small 
molecule bound to the active center of the protein (a). The 
substrate binds when the protein stays in the conformation 
Ci and unbinds at the conformation Ce (b). 

Thus, our model exhibits the conformational response 
to the ligand binding. When the ligand binds, it jump- 
starts the cascade of orderly occurring conformational 
transitions C\ — > Ci — *• . . . — > Cq. In the computer exper- 



iment, we have measured the relaxation times - the num- 
ber of Monte Carlo steps in which the toy protein goes 
from Ci into Cq upon ligand binding. We repeated this 
measurement in 100000 independent MC runs, and Fig- 
ure^ shows the distribution of relaxation times. The tail 
of the distribution demonstrates exponential decay that 
is expected for the biased diffusion in one-dimensional 
system. 




FIG. 11: The energies of the conformations along the path 
in the protein molecule and in the complex. 

These simulation results are in good agreement with, 
some experimental data. For example, Beece et. al.Eij 
studied the rebinding of carbon monoxide to myoglobin 
protein after dissociation induced by 1 fis laser impulse. 
Such a short pulse synchronized conformational trans- 
formations of many protein molecules in solution. Under 
proper circumstances, the population of excited protein 
molecules exhibited exponential decay, quite similar to 
our data presented on Figure [l2| To reproduce also the 
multi-exponential or power law decay oa-the medium 
time scale observed in some experimentscJ will require 
some modifications of the present model. 

V. CONCLUDING REMARKS 

In this study, we designed a toy model which exhibits 
quite a few protein-machine- like properties. First and 
foremost, it has the possibility to move like a mechanical 
system along the one-dimensional path, which we called 
rearrangement path. Like real protein, our model can 
fold into the valley of conformations the bottom of which 
is the rearrangement path. Thus, it folds, but not only 
it just folds, it does so to form a functional, movable 
state. While folded into its low energy valley, our toy 
protein can move along the path which is the valley bot- 
tom. Making only few transverse excursions, it performs 
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FIG. 12: The distribution of the relaxation times of the toy 
protein in response to the events of ligand binding. The MC 
simulations were performed for the sequence designed in MJ 
model, (B) = -1.78, SB = 1.0, T dcs = 0.L5B, T = 0ASB. 

a biased random walk along this path. Attachment and 
detachment of the ligand can switch the direction of the 
bias, thus making toy protein to go orderly through the 
closed cycle mimicking protein function. 

The usefulness of a toy is the possibility to play. Play- 
ing with our toy allows to gain some insights into the 
general concept of machine-like function of proteins. One 
question which we can discuss is that of storing energy 
in the deformed globule. Our model is designed in such 
a way that the energy of the globule at one end of the 
rearrangement path is higher than at the other - see, e.g, 
figure [ll], left panel. Can we say this globule acts as 
a molecular spring? To begin with, this seems to con- 
tradict the formulation of the design Hamiltonian which 
purports to make energy linear in the coordinate k along 
the path (kW), while regular Hookean spring, of course, 
should have energy quadratic in deformation. In fact, 
this is unimportant, because design Hamiltonian can be 
easily modified to purport quadratic (or any other) en- 
ergy profile; only discrete character of the lattice renders 
such fine tuning of the model useless. More importantly, 
all energies involved in our model are in fact free ener- 
gies. It should be understood that proteins, although 
machines, are not heat engines: they function in essen- 
tially isothermic environment. Therefore, when we say, 
for instance, that contact of monomers i and j has en- 
ergy SBij , we should have in mind that pre-averaging has 
been performed over a multitude of small scale, rapidly 
relaxing degrees of freedom (e.g., X' an E^ es °f the residues 
side groups), and then SBij is the free energy of the cor- 
responding contact. From that point of view, we can say 



that at the upper-energy end of the rearrangement path 
our model protein stores free energy rather than energy. 
In other words, it acts not so much like a Hookean spring, 
but rather like a piece of rubber. This analogy gets even 
better if we remember that linearity of the strain-stress 
relation and reversibility of deformation are totally un- 
related in the case of rubber. As in other mechanical 
devices, the question of reversibility in our toy machine 
is related to its speed: the slower is the rate of oper- 
ation, the closer it can approach to the limit of being 
reversible. Of course, when our toy protein undergoes 
fluctuations along its designed path, it exchanges energy 
with the surrounding heat bath, because the process oc- 
curs at the constant temperature. When conformational 
relaxation takes place, because random walk along the 
path is strongly biased - then energy exchange with ther- 
mal bath is dominated by the energy transfer into the 
bath. However, when we imagine that ligand energy has 
driven the enzyme to the high energy end of its path, 
then this energy remains available and drives the subse- 
quent conformational relaxation. In this delicate sense, 
our toy can be said to work like a molecular spring - or, 
once again, as a piece of rubber. It goes without saying 
that this is an overdamped spring (or rubber), no oscil- 
lations arc in question in the course of conformational 
relaxation. 

Our approach is, no doubt, very schematic. It uses to 
the full extent the discrete geometry of moves on the cu- 
bic lattice. This is the heavy price we have to pay for the 
tractability of the model. In our opinion, the wonderful 
properties which we found justify the study of this model. 
We hope also that the fundamental ideas behind our ap- 
proach will be useful for the more realistic models, in- 
cluding off-lattice ones. These fundamental ideas are the 
search for a one-dimensional rearrangement pathways in 
the compact globule and the design of sequences capable 
of folding into a "collective funnel" formed together by 
all states belonging to this pathway. Design of the static 
protein backbonepGpnformationEj, as as sequence de- 
sign for the staticcll limited flexibilitytS backbones are all 
familiar computational approaches. It is a challenge to 
incorporate the search for the one-dimensional rearrange- 
ment pathways into this already difficult area. 
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